Automatic transcription of Somali language

Authors

  • Abdillahi Nimaan
  • Pascal Nocera
  • Jean-François Bonastre
Abstract

Most African countries rely on oral tradition to transmit their cultural, scientific and historical heritage across generations. This ancestral knowledge, accumulated over centuries, is today threatened with disappearance. Automatic transcription and indexing tools appear to be a potential solution for preserving it. This paper presents the first steps toward automatic speech recognition (ASR) of the languages of Djibouti, with the aim of indexing the Djiboutian cultural heritage. This work focuses on the Somali language, which represents half of the targeted Djiboutian audio archives. We describe the principal characteristics of the audio (10 hours) and textual (3M words) training corpora collected, and the first ASR results for this language. By exploiting a specific property of Somali (words are composed of a concatenation of sub-words, called “roots” in this paper), we improve on these results. We also discuss future research directions, such as root-based indexing of audio archives.

Index Terms: resource-poor languages, speech recognition, African languages, oral patrimony indexing

Similar resources

Preservation of african cultural heritage by automatic transcription of african languages

Most African countries rely on oral tradition to transmit their cultural, scientific and historical heritage across generations. This ancestral knowledge, accumulated over centuries, is today threatened with disappearance. Automatic transcription and indexing tools appear to be a potential solution for preserving it. This paper presents the first steps of automatic speech recognition (ASR) of Djibouti ...

Somali and the Nature of Morphophonological Alternations

Somali is an East Cushitic language spoken in the Horn of Africa, with determiner morphemes whose initial consonants undergo interesting phonological alternations when suffixed onto noun stems. This could be described as a type of “derived environment contact phenomena,” as the changes in these suffix-initial consonants are totally dependent on the final segment of the stem and are not active i...

The Pronunciation of Somali-Accented Swedish

The number of Somali immigrants in Sweden has increased in recent years, and there is a need for more competence in teaching Swedish as a second language to this group. This paper investigates aspects of the difficulties Somali speakers have with the pronunciation of Swedish. Recordings of two Somali speakers living in Sweden have been analyzed and it is obvious that there are pronunciation ...

Tonal Alternations and Prosodic Structure in Somali

This paper investigates the prosody of focalisation in Somali, a Cushitic tonal accent language. Somali nouns undergo many tonal accent alternations depending on the discourse context. A primary aim of the study is to explain these alternations by assuming that they are triggered by intonational tones. However, very little attention has been devoted to the intonation of Somali. As Somali ha...

Publication date: 2006